FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture
نویسندگان
چکیده
In this paper we address the problem of semantic labeling of indoor scenes on RGB-D data. With the availability of RGB-D cameras, it is expected that additional depth measurement will improve the accuracy. Here we investigate a solution how to incorporate complementary depth information into a semantic segmentation framework by making use of convolutional neural networks (CNNs). Recently encoder-decoder type fully convolutional CNN architectures have achieved a great success in the field of semantic segmentation. Motivated by this observation we propose an encoder-decoder type network, where the encoder part is composed of two branches of networks that simultaneously extract features from RGB and depth images and fuse depth features into the RGB feature maps as the network goes deeper. Comprehensive experimental evaluations demonstrate that the proposed fusion-based architecture achieves competitive results with the state-of-the-art methods on the challenging SUN RGB-D benchmark obtaining 76.27% global accuracy, 48.30% average class accuracy and 37.29% average intersectionover-union score.
منابع مشابه
Incorporating Depth into both CNN and CRF for Indoor Semantic Segmentation
To improve segmentation performance, a novel neural network architecture (termed DFCN-DCRF) is proposed, which combines an RGB-D fully convolutional neural network (DFCN) with a depth-sensitive fully-connected conditional random field (DCRF). First, a DFCN architecture which fuses depth information into the early layers and applies dilated convolution for later contextual reasoning is designed....
متن کاملDepth-aware CNN for RGB-D Segmentation
Convolutional neural networks (CNN) are limited by the lack of capability to handle geometric information due to the fixed grid kernel structure. The availability of depth data enables progress in RGB-D semantic segmentation with CNNs. State-of-the-art methods either use depth as additional images or process spatial information in 3D volumes or point clouds. These methods suffer from high compu...
متن کاملDeep Projective 3D Semantic Segmentation
Semantic segmentation of 3D point clouds is a challenging problem with numerous real-world applications. While deep learning has revolutionized the field of image semantic segmentation, its impact on point cloud data has been limited so far. Recent attempts, based on 3D deep learning approaches (3DCNNs), have achieved below-expected results. Such methods require voxelizations of the underlying ...
متن کاملMapping Stacked Decision Forests to Deep and Sparse Convolutional Neural Networks for Semantic Segmentation
We consider the task of pixel-wise semantic segmentation given a small set of labeled training images. Among two of the most popular techniques to address this task are Random Forests (RF) and Neural Networks (NN). The main contribution of this work is to explore the relationship between two special forms of these techniques: stacked RFs and deep Convolutional Neural Networks (CNN). We show tha...
متن کاملRelating Cascaded Random Forests to Deep Convolutional Neural Networks for Semantic Segmentation
We consider the task of pixel-wise semantic segmentation given a small set of labeled training images. Among two of the most popular techniques to address this task are Random Forests (RF) and Neural Networks (NN). The main contribution of this work is to explore the relationship between two special forms of these techniques: stacked RFs and deep Convolutional Neural Networks (CNN). We show tha...
متن کامل